Towards Better Decoding and Language Model Integration in Sequence to Sequence Models

Authors

  • Jan Chorowski
  • Navdeep Jaitly
Abstract

The recently proposed Sequence-to-Sequence (seq2seq) framework advocates replacing complex data processing pipelines, such as an entire automatic speech recognition system, with a single neural network trained in an end-to-end fashion. In this contribution, we analyse an attention-based seq2seq speech recognition system that directly transcribes recordings into characters. We observe two shortcomings: overconfidence in its predictions and a tendency to produce incomplete transcriptions when language models are used. We propose practical solutions to both problems, achieving competitive speaker-independent word error rates (WER) on the Wall Street Journal dataset: without separate language models we reach 10.6% WER, while together with a trigram language model we reach 6.7% WER, a state-of-the-art result for HMM-free methods.
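For readers unfamiliar with the metric quoted in the abstract, the following is a minimal sketch of how a word error rate is conventionally computed: the word-level edit distance between a reference transcript and a hypothesis, divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are illustrative assumptions and are not taken from the paper; published evaluations typically also apply text normalization before scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Hypothetical helper for illustration; tokenization is plain
    whitespace splitting, which is an assumption of this sketch.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words so far
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# A truncated hypothesis is penalized through deletions.
print(word_error_rate("the cat sat on the mat", "the cat sat"))
```

Under this definition, an incomplete transcription of the kind the abstract describes shows up as deletion errors: every reference word missing from the hypothesis adds one to the numerator.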


Similar resources

A Mathematical Model for Cell Formation in CMS Using Sequence Data

The cell formation problem in Cellular Manufacturing System (CMS) design has drawn the attention of researchers for more than three decades. However, the use of sequence data for cell formation has been the least investigated area. Sequence data provides valuable information about the flow patterns of various jobs in a manufacturing system. This paper presents a new mathematical model to solve a cell...

The Design and Optimization of Distillation Column with Heat and Power Integrated Systems

Based on two integration steps, an optimization framework is proposed in this work for the synthesis and design of complex distillation sequences. The first step is to employ heat integration in the sequence and reduce the heat consumption and total annual cost of the process. The second is to increase the exergetic efficiency of the sequence by generating power in expanders implemented in the sequence....

Comparison of Spectral Methods for Spoken Language Identification

Automatic spoken language identification is the task of identifying a language from the speech signal. Language identification systems can be divided into two categories: spectral-based methods and phonetic-based methods. In the former, short-time characteristics of the speech spectrum are extracted as a multi-dimensional vector. A statistical model of these features is then obtained for each language. The ...

Comparison of Decoding Strategies for CTC Acoustic Models

Connectionist Temporal Classification has recently attracted a lot of interest as it offers an elegant approach to building acoustic models (AMs) for speech recognition. The CTC loss function maps an input sequence of observable feature vectors to an output sequence of symbols. Output symbols are conditionally independent of each other under CTC loss, so a language model (LM) can be incorporate...

A Source-side Decoding Sequence Model for Statistical Machine Translation

We propose a source-side decoding sequence language model for phrase-based statistical machine translation. This model is a reordering model in the sense that it helps the decoder find the correct decoding sequence. The model uses word-aligned bilingual training data. We show improved translation quality of up to 1.34% BLEU and 0.54% TER using this model compared to three other widely used reor...



Publication date: 2017